{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Nebula Graph Indexing and Retrieval\n",
    "\n",
    "知识图谱已经在各个领域全面开花，如教育、医疗、司法、金融等。相比于无结构文本或半结构化的知识存储方式，\n",
    "知识图谱具备了更加清晰、全面和严谨的行业背景信息，从知识的完备性和专业性上来看更适合作为RAG系统的数据存储介质。\n",
    "\n",
    "\n",
    "> [NebulaGraph](https://docs.nebula-graph.com.cn/3.6.0/1.introduction/1.what-is-nebula-graph/) 是一款开源的、分布式的、易扩展的原生图数据库，能够承载包含数千亿个点和数万亿条边的超大规模数据集，并且提供毫秒级查询。NebulaGraph使用nGQL命令来查询图数据库\n",
    "> [nGQL](https://docs.nebula-graph.com.cn/3.6.0/3.ngql-guide/1.nGQL-overview/1.overview/)是 NebulaGraph 使用的的声明式图查询语言，支持灵活高效的图模式，而且 nGQL 是为开发和运维人员设计的类 SQL 查询语言，易于学习。\n",
    "\n",
    "本示例将使用真实的行业图数据库，演示如何通过大模型将用户的问题转译成数据库查询命令，并将查询结果作为上下文用于增强最终结果的生成"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "313af59daeb0bf42"
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 安装部署\n",
    "\n",
    "Nebula Graph官方提供了多种[服务部署方法](https://docs.nebula-graph.com.cn/3.6.0/4.deployment-and-installation/1.resource-preparations/)，用户可以根据自己的计算资源进行选择。\n",
    "\n",
    "本示例选择通过TAR包的形式在本地安装服务"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "eb09daec54726da3"
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 1. 安装Nebula Graph\n",
    "\n",
    "```shell\n",
    "# 下载指定版本的安装包\n",
    "wget https://oss-cdn.nebula-graph.com.cn/package/3.6.0/nebula-graph-3.6.0.ubuntu2004.amd64.tar.gz\n",
    "\n",
    "# 解压安装包\n",
    "tar -xvzf nebula-graph-3.6.0.ubuntu2004.amd64.tar.gz -C ~/nebulagraph/\n",
    "\n",
    "# 修改配置文件：将子目录etc中的文件nebula-graphd.conf.default、\n",
    "# nebula-metad.conf.default和nebula-storaged.conf.default重命名，\n",
    "# 删除.default，即可应用 NebulaGraph 的默认配置。\n",
    "\n",
    "# 进入scripts目录\n",
    "./nebula.service start all\t\t\t# 启动服务\n",
    "./nebula.service status all\t\t\t# 查看服务状态\n",
    "\n",
    "# 通过NebulaGraph Console 添加Storage主机\n",
    "# 下载地址：https://docs.nebula-graph.com.cn/3.6.0/4.deployment-and-installation/connect-to-nebula-graph/\n",
    "# 登录console\n",
    "./nebula-console-linux-amd64-v3.6.0 -addr 127.0.0.1 -port 9669 -u root -p nebula\n",
    "ADD HOSTS \"127.0.0.1\":9779;\n",
    "exit\n",
    "```"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "c2305f74c9c6f140"
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 2. 安装NebulaGraph Studio\n",
    "\n",
    "```shell\n",
    "# 推荐安装生态工具nebula studio，后续创建&修改schema、查看数据详情会更加方便\n",
    "wget https://oss-cdn.nebula-graph.com.cn/nebula-graph-studio/3.8.0/nebula-graph-studio-3.8.0.x86_64.tar.gz\n",
    "tar -xvf nebula-graph-studio-3.8.0.x86_64.tar.gz\n",
    "\n",
    "#启动服务\n",
    "cd nebula-graph-studio\n",
    "./server\n",
    "\n",
    "# 在浏览器输入 http://{ip}:7001即可打开web界面\n",
    "```"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "6862a6abe628eccd"
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 数据构建\n",
    "\n",
    "本示例使用医疗知识图谱数据搭建图数据库，数据来自：https://github.com/liuhuanyong/QASystemOnMedicalKG"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "c2eb8374295b912a"
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 1. 创建Space & Schema\n",
    "\n",
    "可以在NebulaGraph Studio上完成该步操作，也可以在nebula-console中通过命令行完成创建：\n",
    "```shell \n",
    "./nebula-console-linux-amd64-v3.6.0 -addr 127.0.0.1 -port 9669 -u root -p nebula\n",
    "# Create Space \n",
    "CREATE SPACE `MedicaKG` (partition_num = 10, replica_factor = 1, charset = utf8, collate = utf8_bin, vid_type = FIXED_STRING(32));\n",
    "USE `MedicaKG`;\n",
    "\n",
    "# Create Tag: \n",
    "CREATE TAG `check` ( `name` string NULL COMMENT \"诊断检查项目名称\") ttl_duration = 0, ttl_col = \"\", comment = \"诊断检查项目\";\n",
    "CREATE TAG `department` ( `name` string NULL COMMENT \"医疗科室名称\") ttl_duration = 0, ttl_col = \"\", comment = \"医疗科室\";\n",
    "CREATE TAG `disease` ( `name` string NULL COMMENT \"疾病名称\", `cause` string NULL COMMENT \"疾病病因\", `prevent` string NULL COMMENT \"预防措施\", `cure_lasttime` string NULL COMMENT \"治疗周期\", `cure_way` string NULL COMMENT \"治疗方式\", `cured_prob` string NULL COMMENT \"治愈概率\", `easy_get` string NULL COMMENT \"疾病易感人群\", `description` string NULL COMMENT \"疾病描述\") ttl_duration = 0, ttl_col = \"\", comment = \"疾病\";\n",
    "CREATE TAG `drug` ( `name` string NULL COMMENT \"药品名称\") ttl_duration = 0, ttl_col = \"\", comment = \"药品\";\n",
    "CREATE TAG `food` ( `name` string NULL COMMENT \"食物名称\") ttl_duration = 0, ttl_col = \"\", comment = \"食物\";\n",
    "CREATE TAG `producer` ( `name` string NULL COMMENT \"在售药品名称\") ttl_duration = 0, ttl_col = \"\", comment = \"在售药品\";\n",
    "CREATE TAG `symptom` ( `name` string NULL COMMENT \"疾病症状名称\") ttl_duration = 0, ttl_col = \"\", comment = \"疾病症状\";\n",
    "\n",
    "# Create Edge: \n",
    "CREATE EDGE `department_belongs_to` () ttl_duration = 0, ttl_col = \"\", comment = \"医疗科室从属关系\";\n",
    "CREATE EDGE `disease_accompany_with` () ttl_duration = 0, ttl_col = \"\", comment = \"疾病并发疾病\";\n",
    "CREATE EDGE `disease_common_drug` () ttl_duration = 0, ttl_col = \"\", comment = \"疾病常用药品\";\n",
    "CREATE EDGE `disease_department` () ttl_duration = 0, ttl_col = \"\", comment = \"疾病所属治疗科室\";\n",
    "CREATE EDGE `disease_do_eat` () ttl_duration = 0, ttl_col = \"\", comment = \"疾病宜吃食物\";\n",
    "CREATE EDGE `disease_has_symptom` () ttl_duration = 0, ttl_col = \"\", comment = \"疾病症状\";\n",
    "CREATE EDGE `disease_no_eat` () ttl_duration = 0, ttl_col = \"\", comment = \"疾病忌吃食物\";\n",
    "CREATE EDGE `disease_recommend_drug` () ttl_duration = 0, ttl_col = \"\", comment = \"疾病推荐药品\";\n",
    "CREATE EDGE `disease_recommend_eat` () ttl_duration = 0, ttl_col = \"\", comment = \"疾病推荐食谱\";\n",
    "CREATE EDGE `drugs_of` () ttl_duration = 0, ttl_col = \"\", comment = \"厂商－药物关系\";\n",
    "\n",
    "# Create Index: \n",
    "CREATE TAG INDEX `disease_index` ON `disease` ( `name`(50));\n",
    "CREATE EDGE INDEX `department_belongs_to_index` ON `department_belongs_to` ();\n",
    "CREATE EDGE INDEX `disease_accompany_with_index` ON `disease_accompany_with` ();\n",
    "CREATE EDGE INDEX `disease_common_drug` ON `disease_common_drug` ();\n",
    "CREATE EDGE INDEX `disease_department_index` ON `disease_department` ();\n",
    "CREATE EDGE INDEX `disease_do_eat` ON `disease_do_eat` ();\n",
    "CREATE EDGE INDEX `disease_has_symptom` ON `disease_has_symptom` ();\n",
    "CREATE EDGE INDEX `disease_no_eat` ON `disease_no_eat` ();\n",
    "CREATE EDGE INDEX `disease_recommend_eat` ON `disease_recommend_eat` ();\n",
    "CREATE EDGE INDEX `disease_recomment_drug` ON `disease_recommend_drug` ();\n",
    "CREATE EDGE INDEX `drugs_of_index` ON `drugs_of` ();\n",
    "```"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "2e409abedb660642"
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 2. 导入数据\n",
    "\n",
    "NebulaGraph 提供了多种导入数据的方法，本示例选择通过nebula3-python客户端连接图数据库并导入数据，代码可参考[脚本](import_data.py)\n",
    "```shell\n",
    "# 执行以下命令导入数据\n",
    "python import_data.py --ip xxxxxxxxx --port xxxx --file_path xxxxxx --space xxxxxxx\n",
    "```"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "99d03071c0275353"
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 查询数据库"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "84c33d794aabe9e"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import os\n",
    "os.chdir(os.path.dirname(os.path.dirname(os.getcwd())))\n",
    "\n",
    "from rag.connector.knowledge_graph.nebula_graph import NebulaKnowledgeGraph\n",
    "from langchain_openai.chat_models import ChatOpenAI\n",
    "from langchain_core.prompts.prompt import PromptTemplate\n",
    "\n",
    "openai_api_key = \"\"\n",
    "llm = ChatOpenAI(\n",
    "    model_name=\"gpt-4\",\n",
    "    openai_api_key=openai_api_key\n",
    ")\n",
    "\n",
    "# 建议根据实际的图数据对prompt进行优化\n",
    "# 例如增加查询示例可以大幅提升nGQL命令的生成效率\n",
    "\n",
    "NGQL_GENERATION_TEMPLATE = \"\"\"<指令> 你是一位出色的AI助手，擅长将用户的问题转化成图数据库查询语句。\n",
    "现在你的任务是根据用户输入的问题，生成nGQL命令，用于查询NebulaGraph图数据库。\n",
    "只能使用提供的图数据节点、关系类型和属性信息，不允许使用除此之外的其它信息。</指令>\n",
    "图结构: {schema} \n",
    "\n",
    "注意事项: 不要在回复中包含任何解释和抱歉信息。除了构建nGQL命令之外，请勿回答任何可能提出其他问题的问题。除了生成的nGQL命令外，请勿包含任何文本。\n",
    "下面有几个例子，供你参考：\n",
    "\n",
    "# 示例1：\n",
    "问题: 支气管炎的预防措施有哪些？\n",
    "nGQL命令: match (v:disease) where v.disease.name==\"支气管炎\" return v.disease.prevent;\n",
    "\n",
    "# 示例2：\n",
    "问题: 得了支气管炎可以吃哪些药？\n",
    "nGQL命令: match (v1:disease)-[e:disease_recommend_drug]->(v2:drug) where v.disease.name==\"支气管炎\" return v2.drug.name;\n",
    "\n",
    "问题: {question}\n",
    "nGQL命令: \"\"\"\n",
    "\n",
    "NGQL_GENERATION_PROMPT = PromptTemplate(\n",
    "    template=NGQL_GENERATION_TEMPLATE, input_variables=[\"schema\", \"question\"]\n",
    ")\n",
    "nebula_graph = NebulaKnowledgeGraph(llm, ngql_prompt=NGQL_GENERATION_PROMPT)\n",
    "print(nebula_graph.graph_schema)\n",
    "\n",
    "question = \"病毒性肠炎有哪些并发症？\"\n",
    "docs = nebula_graph.call(question)\n",
    "print(docs)"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "d33d917d43e77e84"
  },
  {
   "cell_type": "markdown",
   "source": [
    "## LLM生成"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "340f1f4d4b3f5669"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import os\n",
    "os.chdir(os.path.dirname(os.path.dirname(os.getcwd())))\n",
    "\n",
    "from rag.chains.generate import GenerateChain\n",
    "\n",
    "generate_chain = GenerateChain(llm)\n",
    "for ans in generate_chain.chain(query=question, docs=docs, history=[]):\n",
    "    print(ans)"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "680ac38511ad564"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [],
   "metadata": {
    "collapsed": false
   },
   "id": "2bbe5f53b9a65126"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
